Abstract

Inspired by the process by which ants gradually optimize their foraging trails, this report investigates the cooperative solution of a class of free-final-time, partially-constrained-final-state optimal control problems by a group of dynamic systems. A class of cooperative, pursuit-based algorithms is proposed for finding optimal solutions by iteratively optimizing an initial feasible control. The proposed algorithms require only short-range, limited interactions between group members, avoid the need for a “global map” of the environment on which the group evolves, and solve an optimal control problem in “small” pieces, in a manner which will be made precise. The performance of the algorithms is illustrated in a series of simulations and laboratory experiments.


Key words: Co-operative control, Optimization, Algorithms, Agents, Group work, Trajectories, Minimum-time control


1 Introduction

In recent years, problems in cooperative control are increasingly capturing the attention of researchers, fueled by the development of decentralized control systems with cost and performance advantages. The rising interest in deploying cooperative systems also stems from their potential to perform tasks that are not feasible for individuals. Examples include remote exploration and information gathering by swarms of small autonomous robots [1] and satellite arrays, to name a few. Members of such “engineered collectives” usually have, just like their natural counterparts, limited sensing, communication and computing capabilities. This suggests that each member can only perform relatively simple tasks. However, individual limitations can often be overcome by cooperation, if one can identify an effective way to organize the group into “more than the sum of its parts”. Doing so may be difficult because it requires decomposing a desired group behavior into individual behaviors. The results, however, can be spectacular, as is often demonstrated by biological collectives. For example, a school of fish can coordinate its movement in a tight formation and respond almost as fast as a single organism to evade approaching danger; worker honey bees share information by “dancing” and distribute themselves among nectar sources in accordance with the profitability of each source; ants are known to use pheromone secretions for recruiting nestmates and for optimizing their foraging trails [4]. Observations of such activities in nature have already seeded a variety of research, from modeling of animal group behaviors [4,2,15,9], to distributed collective covering and searching [16,12], cooperative estimation [13,10], cooperative robotic teams [6,17,11] and biologically-motivated optimization [5,3].


A particularly interesting example of cooperation in natural animal aggregates has to do with the foraging activity of ant colonies. When ants find food, they recruit their co-workers to convey it back to the nest. Finding an efficient (short) path between the nest and the food source appears to be too complicated for individual ants to accomplish, considering their limited cognition and their size relative to the obstacles in the environment, including stones, sticks and crevices. Nonetheless, a colony of ants exhibits a high degree of competence in such tasks [4].


Several models have been proposed in an attempt to capture the organizing principle by which ants find shortest paths when foraging. For example, [4] described a model based on the use of pheromonal secretions that help ants choose trails. Briefly, pheromonal secretions are laid along the paths by ants to recruit nestmates and to indicate the frequency of use of that path. Inspired by that model, [16] developed robust adaptive algorithms to perform tasks requiring the traversal of an unknown region, such as cleaning the floor of an unmapped building; [5] introduced a search methodology based on a “distributed autocatalytic process” to solve a classical optimization problem, the traveling salesman problem.

A particularly simple, but elegant, ant colony organizing rule was presented in [2], where it was shown that ants that “pursued” one another on $\mathbb{R}^2$ (each pointing its velocity vector towards its predecessor) produced progressively “straighter” trails. That idea was later extended to path optimization problems involving kinematic vehicles in non-Euclidean environments [7,8].

Although local pursuit was inspired by observations of ant colonies and has been applied to other engineered collectives, the earlier works [2,7] dealt exclusively with the “discovery” of geodesics, meaning that the autonomous systems making up the group had simple dynamics ($\dot{x} = u$) with no drift terms. In [8], it was shown that the earlier work could be generalized to a much broader class of optimal control problems, and to collectives whose members have non-trivial dynamics. The proposed algorithm, termed “local pursuit” (to use the term coined in [2]), guides members of a group toward the solution of an optimal control problem. However, the algorithms presented in [8] were restricted to problems with fixed final time and fixed final states. This report explores a modified version of local pursuit for solving a broader and more interesting class of optimal control problems, with free final time and partially-constrained final states.

Under our proposed control strategy, members of the collective do not need a global map of their environment or even an agreed-upon common coordinate system. Thus, neither powerful sensing nor mass information exchange is needed, nor is computation over “long” distances. This makes the proposed algorithms most useful in trajectory optimization problems which are easier to solve when boundary conditions are “close” to one another (because of, for example, the members’ computational or sensing limitations), with the term “close” taken to include not only geographical separation but also distance on the manifold on which copies of a dynamical system evolve.

The remainder of this report is organized as follows: Section 2 describes the optimal control problems to be addressed and proposes an iterative algorithm that is appropriate for a group of cooperating dynamical systems. Section 3 discusses the main results concerning the performance of the proposed algorithm. Section 4 presents a series of simulations and laboratory experiments that illustrate our approach.

2 A bio-inspired algorithm for optimal control

We are interested in the solution of optimal control problems using a group of cooperating “agents”. The term “agent” will refer to a member of a group of dynamical systems, each taken to be a copy of:

$$\dot{x}_k = f(x_k, u_k), \qquad x_k(t) \in \mathbb{R}^n, \quad u_k(t) \in U \subset \mathbb{R}^m \qquad (1)$$

for $k = 0, 1, 2, \dots$. Physically, each copy of (1) could stand for a robot, UAV or other autonomous system.

2.1 Problem Statement and Notation

The problem under consideration is:

Problem 1 Find a trajectory $x^*(t)$, a final time $T^* > 0$ and a final state $x^*(T^*)$ that minimize

$$J(x, \dot{x}, t_0) = \int_{t_0}^{t_0+T} g(x, \dot{x})\,dt + F(x(t_0+T)) \qquad (2)$$

subject to the constraints $x(t_0) = x_0$ and $Q(x(t_0+T)) = 0$,

where it is assumed that $g(x(t), \dot{x}(t)) > 0$, $F(x(t_0+T)) > 0$ and that $Q(\cdot)$ is an algebraic function of the state.

Definition 1 Given the final state constraint $Q(x) = 0$, the constraint set of $x$ is

$$S_Q = \{x \mid Q(x) = 0\}.$$


The function $F(x)$ in (2) will be taken to be of the form:

$$F(x) = \begin{cases} \bar{F}(x) & \text{if } x \in S_Q \\ 0 & \text{if } x \notin S_Q \end{cases}$$

with $\bar{F}(x) > 0$, $\forall x \in S_Q$. Problem 1 involves optimal control with free final time and partially-constrained final state. Fixed final state problems, where $S_Q$ is a single state [14,8], are special cases of the problems considered here.

For any pair of fixed states $a, b \in B \subset \mathbb{R}^n$, let $x^*(t)$ denote the optimal trajectory from $a$ to $b$ with free final time (minimizing $J$ with respect to $x$ and $T$ only). The corresponding optimal final time is $T^*(a, b)$. The cost of following $x^*$ is denoted as:

$$\eta(a, b, t_0) = \int_{t_0}^{t_0+T^*} g(x^*, \dot{x}^*)\,dt + F(x^*(t_0+T^*)) = \min_{x,\,T} J(x, \dot{x}, t_0) \qquad (3)$$

subject to $x(t_0) = a$, $x(t_0+T) = b$.

Now, let $x^*(t)$ be the optimal trajectory from an initial state $a$ to the constraint set $S_Q$, and let $T_Q^*(a, S_Q)$ be the corresponding optimal final time from $a$ to $S_Q$. The cost of following $x^*$ is denoted by

$$\eta_Q(a, t_0) = \int_{t_0}^{t_0+T_Q^*} g(x^*, \dot{x}^*)\,dt + F(x^*(t_0+T_Q^*)) = \min_{x,\,T_Q} J(x, \dot{x}, t_0) \qquad (4)$$

subject to $x(t_0) = a$, $Q(x(t_0+T_Q)) = 0$.

The cost of following a generic trajectory $x(t)$ of (1) during $[t_0, t_0+\sigma)$ is denoted by:

$$C(x, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x, \dot{x})\,dt + F(x(t_0+\sigma)) \qquad (5)$$
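To make (5) concrete on sampled data, here is a minimal Python sketch, not the authors' code. It assumes the trajectory is known only at sample times, approximates the integral by a midpoint rule with constant velocity between samples, and applies $F$ to the last sample only. The helper names (`trajectory_cost`, `in_SQ`) and the example choices (a planar agent, $g(x,\dot{x}) = \|\dot{x}\|$, the hypothetical constraint set $S_Q = \{y = 2\}$ and $\bar{F} \equiv 0.5$ on $S_Q$) are illustrative assumptions.

```python
import numpy as np

def trajectory_cost(ts, xs, g, F):
    """Approximate C(x, t0, sigma) of Eq. (5) from samples of a trajectory.

    ts : (N,) increasing times, ts[0] = t0 and ts[-1] = t0 + sigma
    xs : (N, n) sampled states x(ts[i])
    g  : running cost g(x, xdot), evaluated once per segment
    F  : terminal cost, applied only to the final state
    """
    dt = np.diff(ts)                          # segment durations
    xdot = np.diff(xs, axis=0) / dt[:, None]  # constant velocity on each segment
    xmid = 0.5 * (xs[1:] + xs[:-1])           # segment midpoints
    running = sum(g(xm, v) * h for xm, v, h in zip(xmid, xdot, dt))
    return float(running) + F(xs[-1])

# Hypothetical planar example: S_Q = {x : x[1] = 2}, F_bar = 0.5 on S_Q.
def in_SQ(x, tol=1e-9):
    return abs(x[1] - 2.0) < tol

def F(x):
    return 0.5 if in_SQ(x) else 0.0           # F = F_bar on S_Q and 0 elsewhere

def g(x, xdot):
    return float(np.linalg.norm(xdot))        # running cost = speed, so C = path length + F

ts = np.linspace(0.0, 2.0, 201)
xs = np.stack([np.zeros_like(ts), ts], axis=1)   # x(t) = (0, t): straight up to S_Q
cost = trajectory_cost(ts, xs, g, F)
print(cost)   # about 2.5: path length 2 plus terminal cost 0.5
```

The midpoint rule is only one possible quadrature; any consistent scheme would do for the illustration.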

The following facts can be derived easily from the properties of optimal trajectories and will be helpful in the sequel:

Fact 2 Let $\eta$, $\eta_Q$, $C$ be as defined in (3), (4), (5), and let $x_k(t)$ be a generic trajectory of (1). Then the following hold:

(1) $\eta(a, b, t_0) \le C(x_k, t_0, T)$ for any $x_k(\cdot)$ with $x_k(t_0) = a$, $x_k(t_0+T) = b$.

(2) $\eta(a, c, t_0) \le \eta(a, b, t_0) + \eta(b, c, t_0+\sigma)$ with $\sigma = T^*(a, b)$.

(3) $\eta_Q(a, t_0) \le \eta(a, b, t_0)$ for any $b \in S_Q$.
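As a sanity check of Fact 2, the sketch below uses one instance in which $\eta$ and $\eta_Q$ have closed forms. It assumes a planar kinematic agent $\dot{x} = u$ with $\|u\| \le 1$, $g \equiv 1$ and, for simplicity, $F \equiv 0$ (minimum time). In that case $\eta(a,b) = \|b - a\|$, and for the hypothetical constraint set $S_Q = \{y = 2\}$, $\eta_Q(a) = |2 - a_y|$. The dependence on $t_0$ drops out because this instance is time-invariant. The check only samples random points; it is not a proof.

```python
import numpy as np

# Hypothetical instance: xdot = u, |u| <= 1, g = 1, F = 0, S_Q = {y = 2}.
def eta(a, b):
    """Minimum time from a to b at unit speed: the Euclidean distance."""
    return float(np.linalg.norm(np.asarray(b) - np.asarray(a)))

def eta_Q(a):
    """Minimum time from a to the line S_Q = {y = 2}."""
    return abs(2.0 - a[1])

def polyline_cost(xs):
    """Cost (travel time) of a unit-speed trajectory through the points xs."""
    return float(np.sum(np.linalg.norm(np.diff(xs, axis=0), axis=1)))

rng = np.random.default_rng(0)
for _ in range(1000):
    a, b, c = rng.normal(size=(3, 2))
    # Fact 2(1): any trajectory from a to b costs at least eta(a, b)
    detour = np.array([a, rng.normal(size=2), b])
    assert eta(a, b) <= polyline_cost(detour) + 1e-12
    # Fact 2(2): going through an intermediate state b is never cheaper
    assert eta(a, c) <= eta(a, b) + eta(b, c) + 1e-12
    # Fact 2(3): reaching one particular point s of S_Q costs at least eta_Q
    s = np.array([rng.normal(), 2.0])
    assert eta_Q(a) <= eta(a, s) + 1e-12
print("Fact 2 holds on all samples")
```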

2.2 Algorithm

Assume that an initial feasible (but suboptimal) control/trajectory pair $(u_{\rm feas}(t), x_{\rm feas}(t))$ for (1) is available, obtained through a combination of a-priori knowledge about the problem and/or random exploration. Following the idea in [2,8], the agents are scheduled to leave the initial state $x_0$ sequentially and pursue one another towards the set $S_Q$, in a way which will be made precise shortly. The sequence is initiated with the first agent following $x_{\rm feas}$ to reach a point in $S_Q$. Each subsequent agent will attempt to intercept its predecessor, along optimal trajectories defined by (3), if the predecessor has not reached its final state in $S_Q$. If the predecessor has already reached the constraint set $S_Q$, then the pursuer ignores the preceding agent and instead evolves along the optimal trajectory defined by (4). The precise rules that govern the movement of each agent are:

Algorithm 1 (Modified Continuous Local Pursuit): Identify the starting state $x_0$ on $D$ and the constraint set $S_Q$. Let $x_0(t)$ ($t \in [0, T_0]$) be an initial trajectory satisfying (1) with $x_0(0) = x_0$, $Q(x_0(T_0)) = 0$. Choose $0 < \Delta < T_0$.

(1) For $k = 1, 2, 3, \dots$, let $t_k = k\Delta$ be the starting time of the $k$th agent. Let $u_k(t) = 0$, $x_k(t) = x_0$ for $0 \le t \le t_k$.

(2) For all $t \ge t_k$, calculate the control $u_t^*(\tau)$ such that $f(x_t(\tau), u_t^*(\tau)) = \dot{x}_t(\tau)$ and $x_t(\tau)$ achieves

$$\begin{cases} \eta(x_k(t), x_{k-1}(t), t), & \text{if } x_{k-1}(t) \notin S_Q \\ \eta_Q(x_k(t), t), & \text{if } x_{k-1}(t) \in S_Q \end{cases}$$

where $\tau \in [t, t + T^*(x_k(t), x_{k-1}(t))]$ if $x_{k-1}(t) \notin S_Q$, or $\tau \in [t, t + T_Q^*(x_k(t), S_Q)]$ if $x_{k-1}(t) \in S_Q$.

(3) Apply the initial value of the planned control, $u_k(t) = u_t^*(t)$, to the $k$th agent.

(4) Repeat from step 2 until the $k$th agent reaches $S_Q$.
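The rules of Algorithm 1 can be sketched numerically on the hypothetical minimum-time instance used above ($\dot{x} = u$, $\|u\| \le 1$, $g \equiv 1$, $S_Q = \{y = 2\}$, $x_0 = (0, 0)$). There the optimal trajectories of step 2 are straight lines traversed at unit speed, so replanning continuously reduces to stepping a distance `dt` toward the leader's current position (catching up) or straight toward $S_Q$ (free running). Continuous time is only approximated by the small step `dt`; the initial feasible trajectory, `Delta`, `dt` and the number of agents are illustrative assumptions, not values from the report.

```python
import numpy as np

GOAL_Y = 2.0  # hypothetical constraint set S_Q = {x in R^2 : x[1] = 2}

def in_SQ(p, tol=1e-9):
    return p[1] >= GOAL_Y - tol

def walk(p, target, n_steps, dt):
    """Follow the straight (here optimal) path from p toward target at unit speed.

    Returns up to n_steps new positions, each at most dt apart; stops at target.
    """
    out = []
    for _ in range(n_steps):
        d = target - p
        dist = np.linalg.norm(d)
        p = target.copy() if dist <= dt else p + dt * d / dist
        out.append(p)
        if dist <= dt:
            break
    return out

def initial_trajectory(dt):
    """Feasible but suboptimal x_0(t): 2 units to the right, then up to S_Q."""
    path = [np.zeros(2)]
    path += walk(path[-1], np.array([2.0, 0.0]), 10**6, dt)
    path += walk(path[-1], np.array([2.0, GOAL_Y]), 10**6, dt)
    return path

def mclp(num_agents=30, Delta=0.5, dt=0.01):
    """Discretized mCLP; returns the cost (travel time) of every agent."""
    lag = int(round(Delta / dt))        # pursuit interval Delta in time steps
    leader = initial_trajectory(dt)     # leader positions, one per time step
    costs = [(len(leader) - 1) * dt]
    for _ in range(num_agents):
        path = [np.zeros(2)]            # every agent leaves x_0 = (0, 0)
        while not in_SQ(path[-1]):
            p = path[-1]
            i = len(path) - 1           # steps already taken by the follower
            # the leader started lag steps earlier; once finished it stays in S_Q
            lead = leader[min(i + lag, len(leader) - 1)]
            if in_SQ(lead):             # free running: nearest point of S_Q
                target = np.array([p[0], GOAL_Y])
            else:                       # catching up: leader's current position
                target = lead
            path += walk(p, target, 1, dt)   # replan after every step
        leader = path
        costs.append((len(path) - 1) * dt)
    return costs

costs = mclp()
print([round(c, 2) for c in costs[::5]])  # costs should not increase from agent to agent
```

In this instance the optimum is the vertical segment from $(0,0)$ to $(0,2)$ with cost 2, so the printed costs should approach 2 from above as more agents are added.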

When discussing pairs of agents during pursuit, the $(k-1)$th agent is designated as the “leader” and the $k$th agent as the “follower”. As Step 2 of the algorithm indicates, there are two types of follower movement, “catching up” and “free running”, depending on whether the leader has reached the final constraint set $S_Q$. The former lets agents “learn” from their leaders, while the “free running” stage enables them to find the optimal final state within $S_Q$ once they are close enough to that set. Both stages are essential in order for the group to solve Problem 1.

Note that modified continuous local pursuit (mCLP) requires each follower to continuously update its movement (via sensing and computing) to catch up with its leader during the pursuit process. Continuous pursuit may impose a significant computational burden on each agent, especially in cases where the optimal trajectories “linking” follower and leader cannot be written down in closed form. For instances of Problem 1 where, for each follower, the optimal time to reach the leader is bounded below for all time, it is possible to alter the previous algorithm so that each agent only performs a finite number of updates as it evolves from $x_0$ to $S_Q$. This is done by defining a modified “sampled local pursuit” policy, similar to that used in [8] for fixed final state problems:

Algorithm 2 (Modified Sampled Local Pursuit): Identify the starting state $x_0$ on $D$ and the constraint set $S_Q$. Let $x_0(t)$, $t \in [0, T_0]$, be an initial trajectory satisfying (1) with $x_0(0) = x_0$, $Q(x_0(T_0)) = 0$. Choose the pursuit interval $\Delta$ such that $0 < \Delta < T_0$.

(1) For $k = 1, 2, 3, \dots$, let $t_k = k\Delta$ be the starting time of the $k$th agent, i.e. $u_k(t) = 0$, $x_k(t) = x_0$ for $0 \le t \le t_k$.

(2) Choose the updating interval $\delta_i < \min(\Delta, T_{i-1}^*)$, where $T_{i-1}^*$ is the optimal final time of the last update defined by Eq. (3) or (4), and set $T_{-1}^* = \Delta$ for convenience. At the updating times $t_0 = t_k$ and $t_i = t_{i-1} + \delta_{i-1}$, $i = 1, 2, 3, \dots$, calculate the control $u_i^*(\tau)$ that achieves (subject to (1)):

$$\begin{cases} \eta(x_k(t_i), x_{k-1}(t_i), t_i), & \text{if } x_{k-1}(t_i) \notin S_Q \\ \eta_Q(x_k(t_i), t_i), & \text{if } x_{k-1}(t_i) \in S_Q \end{cases}$$

where $\tau \in [t_i, t_i + T^*(x_k(t_i), x_{k-1}(t_i))]$ if $x_{k-1}(t_i) \notin S_Q$, or $\tau \in [t_i, t_i + T_Q^*(x_k(t_i), S_Q)]$ if $x_{k-1}(t_i) \in S_Q$.

(3) Apply $u_k(t) = u_i^*(t)$ to the $k$th agent for $t \in [t_i, t_{i+1})$.

(4) Repeat from step 2 until the $k$th agent reaches $S_Q$.
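A sampled counterpart of the previous idea, again on the hypothetical instance $\dot{x} = u$, $\|u\| \le 1$, $g \equiv 1$, $S_Q = \{y = 2\}$, is sketched below. At each update the follower samples the leader once (or, if the leader is already in $S_Q$, the nearest point of $S_Q$) and commits to the straight-line plan toward that frozen target. As a simplification of the rule $\delta_i < \min(\Delta, T^*_{i-1})$, each plan is followed for `delta` time units or until its end point is reached, whichever comes first; `delta`, `Delta`, `dt` and the initial trajectory are illustrative assumptions.

```python
import numpy as np

GOAL_Y = 2.0  # hypothetical constraint set S_Q = {x in R^2 : x[1] = 2}

def in_SQ(p, tol=1e-9):
    return p[1] >= GOAL_Y - tol

def walk(p, target, n_steps, dt):
    """Execute a straight-line (here optimal) plan toward target for at most n_steps."""
    out = []
    for _ in range(n_steps):
        d = target - p
        dist = np.linalg.norm(d)
        p = target.copy() if dist <= dt else p + dt * d / dist
        out.append(p)
        if dist <= dt:          # plan completed before the next update
            break
    return out

def mslp(num_agents=30, Delta=0.5, delta=0.1, dt=0.01):
    """Discretized mSLP: replan only every delta time units; returns agent costs."""
    lag = int(round(Delta / dt))                # pursuit interval in time steps
    upd = int(round(delta / dt))                # updating interval in time steps
    leader = [np.zeros(2)]                      # feasible x_0(t): right, then up
    leader += walk(leader[-1], np.array([2.0, 0.0]), 10**6, dt)
    leader += walk(leader[-1], np.array([2.0, GOAL_Y]), 10**6, dt)
    costs = [(len(leader) - 1) * dt]
    for _ in range(num_agents):
        path = [np.zeros(2)]
        while not in_SQ(path[-1]):
            p = path[-1]
            lead = leader[min(len(path) - 1 + lag, len(leader) - 1)]
            # sample the leader once, then commit to the resulting plan
            target = np.array([p[0], GOAL_Y]) if in_SQ(lead) else lead.copy()
            path += walk(p, target, upd, dt)
        leader = path
        costs.append((len(path) - 1) * dt)
    return costs

costs = mslp()
print([round(c, 2) for c in costs[::5]])
```

With `delta = dt` every plan is followed for a single step and the sketch behaves like a continuous-pursuit discretization; larger `delta` trades fewer updates for slower improvement per agent.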

Under modified sampled local pursuit (mSLP) each agent executes a finite number of “updates” of its trajectory, once every $\delta < \Delta$ time units. The reduced computational demands of mSLP make it attractive in cases where the complexity of the agents’ dynamics, as well as that of the environment they evolve in, necessitates the use of numerical methods for finding optimal trajectories. In fact, the sampled version of the local pursuit algorithm can itself be useful as a numerical method for computing optimal controls. A full analysis of local pursuit in that light is currently under way.

In the algorithms defined above, we assume that no follower intercepts its leader. If an interception does occur, the follower will “join” its leader by repeating the leader’s trajectory after the time of interception. Because the initial agent travels along its trajectory for $T_0$ units of time and the pursuit interval $\Delta$ is finite, there will be a finite number of such events, whose existence will not affect the results discussed below.


3 Main Results

In this section we explore the behavior of the group (1) under modified continuous local pursuit (mCLP). Although we will not do so here, similar results can be derived for modified sampled local pursuit (mSLP), using [8] as a starting point.

mCLP defines an ordered sequence of trajectories $\{x_k(t)\}$. This section will first investigate the convergence of the trajectories’ cost, and then show that the trajectories themselves converge to a local optimum. It will be convenient to distinguish between the planned trajectory, denoted by $\hat{x}(t)$, that a follower computes at every point in time in order to reach its leader, and the realized trajectory, denoted by $x(t)$, along which the follower actually evolves.


Lemma 1 Consider a leader-follower pair evolving under mCLP with a pursuit interval $\Delta$. Let the leader’s trajectory be $x_{k-1}(t)$ ($t \in [t_{k-1}, t_{k-1}+T_{k-1}]$) and fix $\lambda \in [0, T_{k-1})$. Suppose the follower updates its trajectory only once during $[t_k, t_k+T_k]$, as described next:

• If $\lambda < T_{k-1} - \Delta$, the follower moves along the optimal trajectory (in the sense of (3)) joining $x_k(t_k+\lambda)$ and $x_{k-1}(t_k+\lambda)$ with optimal final time $T = T^*(x_k(t_k+\lambda), x_{k-1}(t_k+\lambda))$. During other times, the follower replicates the leader’s trajectory, i.e.

$$x_k(t) = x_{k-1}(t-\Delta), \qquad t \in [t_k, t_k+\lambda]$$
$$x_k(t) = x_{k-1}(t-T), \qquad t \in [t_k+\lambda+T, t_k+T_k];$$

• If $\lambda \ge T_{k-1} - \Delta$, the follower evolves along the optimal trajectory from $x_k(t_k+\lambda)$ to the constraint set $S_Q$ (in the sense of (4)). Similarly, during other times

$$x_k(t) = x_{k-1}(t-\Delta), \qquad t \in [t_k, t_k+\lambda]$$

Then the cost along the follower’s trajectory will be no greater than the leader’s.


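PROOF. First, choose $\lambda < T_{k-1} - \Delta$. Starting at time $t_k+\lambda$ and during $t \in [t_k+\lambda, t_k+\lambda+T]$, the follower moves on the locally optimal trajectory $\hat{x}_k(t)$ (see Fig. 1). The cost along $x_k$ is

Fig. 1. Illustration of the trajectory obtained by a single update when $\lambda < T_{k-1} - \Delta$.

$$\begin{aligned}
C(x_k, t_k, T_k) &= C(x_k, t_k, \lambda) + C(x_k, t_k+\lambda+T, T_k-\lambda-T) + \eta(x_k(t_k+\lambda), x_{k-1}(t_k+\lambda), t_k+\lambda) \\
&\le C(x_{k-1}, t_{k-1}, \lambda) + C(x_{k-1}, t_{k-1}+\lambda, \Delta) + C(x_{k-1}, t_{k-1}+\lambda+\Delta, T_{k-1}-\lambda-\Delta) \\
&= C(x_{k-1}, t_{k-1}, T_{k-1}) \qquad (6)
\end{aligned}$$

where $T = T^*(x_k(t_k+\lambda), x_{k-1}(t_k+\lambda))$. The inequality bounds the $\eta$ term by Fact 2(1): the leader itself travels from $x_{k-1}(t_{k-1}+\lambda) = x_k(t_k+\lambda)$ to $x_{k-1}(t_k+\lambda)$ in $\Delta$ time units, while the remaining terms are costs of identical pieces of trajectory.

Fig. 2. Illustration of the trajectory obtained by a single update when $\lambda \ge T_{k-1} - \Delta$.

If $\lambda \ge T_{k-1} - \Delta$ (see Fig. 2), the cost along $x_k$ is

$$\begin{aligned}
C(x_k, t_k, T_k) &= C(x_k, t_k, \lambda) + \eta_Q(x_k(t_k+\lambda), t_k+\lambda) \\
&\le C(x_{k-1}, t_{k-1}, \lambda) + C(x_{k-1}, t_{k-1}+\lambda, T_{k-1}-\lambda) \\
&= C(x_{k-1}, t_{k-1}, T_{k-1})
\end{aligned}$$

where the inequality follows from Fact 2(1) and (3), since the remainder of the leader’s trajectory starts at $x_k(t_k+\lambda)$ and ends in $S_Q$.

Therefore the cost along the follower’s trajectory is no greater than the leader’s. □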
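Lemma 1 can be checked numerically on the hypothetical minimum-time instance ($\dot{x} = u$, $\|u\| \le 1$, $g \equiv 1$, $S_Q = \{y = 2\}$), where cost equals path length and optimal trajectories are straight segments. The sketch stores the leader as samples of $x_{k-1}$, measures $\lambda$ and $\Delta$ in samples (`lam`, `lag`), and only compares path lengths, so the straight shortcut is not resampled in time. The function names are illustrative, not notation from the report.

```python
import numpy as np

def length(path):
    """Cost of a unit-speed path under g = 1: its total length."""
    return float(np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1)))

def single_update(leader, lam, lag, goal_y):
    """Follower of Lemma 1 that replans once, at sample lam.

    leader : (N, 2) samples of x_{k-1}; the last one lies in S_Q = {y = goal_y}
    lam    : update time lambda, in samples
    lag    : pursuit interval Delta, in samples
    """
    head = leader[: lam + 1]                  # replicate the leader up to t_k + lambda
    if lam < len(leader) - 1 - lag:           # case lambda < T_{k-1} - Delta
        tail = leader[lam + lag:]             # straight jump, then replicate the leader
    else:                                     # case lambda >= T_{k-1} - Delta
        p = leader[lam]
        tail = np.array([[p[0], goal_y]])     # straight to the nearest point of S_Q
    return np.vstack([head, tail])

rng = np.random.default_rng(1)
pts = np.cumsum(rng.normal(scale=0.3, size=(40, 2)), axis=0)
leader = np.vstack([np.zeros(2), pts, [[pts[-1, 0], 2.0]]])  # random feasible leader
C_leader = length(leader)
for lag in (1, 5, 10):
    for lam in range(len(leader)):
        assert length(single_update(leader, lam, lag, 2.0)) <= C_leader + 1e-12
print("no single-update follower costs more than its leader")
```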


Now, the cost of the iterated trajectories can be shown to converge under mCLP:

Lemma 2 (Convergence of Cost) If the agents (1) evolve under mCLP, the cost of the iterated trajectories converges.


PROOF. Let $C_{k-1}$ be the cost along the leader’s trajectory $x_{k-1}(t)$ ($t \in [t_{k-1}, t_{k-1}+T_{k-1}]$). Define a trajectory sequence $x_k^i(t)$ ($t \in [t_k, t_k+T_k^i]$), $i = 0, 1, 2, \dots$, whose corresponding costs and final times are $C_k^i$ and $T_k^i$, as follows: let $x_k^0(t) = x_{k-1}(t-\Delta)$ (the trajectory of a “leader”) and let $x_k^i$ ($i > 0$) be the trajectory of an agent that pursues $x_k^{i-1}$ by performing only a single trajectory update, as described in Lemma 1, with $\lambda = (i-1)\delta$, $\delta > 0$ (see Fig. 3).

Fig. 3. Illustration of the trajectory sequence $x_k^i(t)$. Each trajectory is obtained by a single update upon its predecessor.

By Lemma 1, $C_k^i \le C_k^{i-1}$; since $C_k^i$ is also bounded below for fixed $k$, $\lim_{i\to\infty} C_k^i = C_k^\infty$ exists for each $k$. Consequently,

$$C_k^\infty \le C_k^0 = C_{k-1}.$$

Now, take $\delta = T_{k-1}/i$, so that $\delta \to 0$ as $i \to \infty$. In the limit, the trajectory $x_k^\infty(t)$ is precisely what would be obtained by an agent that pursues its leader $x_{k-1}$ using mCLP. Hence, the follower’s cost is $C_k = C_k^\infty \le C_{k-1}$. Because the sequence $\{C_k\}$ is non-increasing and bounded below (there exists a minimum for (2)), it must converge to a limit. □
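The sequence $x_k^i$ used in the proof of Lemma 2 (each trajectory replans once, at $\lambda = (i-1)\delta$, upon its predecessor) can be mimicked on the same hypothetical minimum-time instance. The sketch below resamples every straight segment at spacing at most `dt`, so that sample indices keep approximating time; `lag` and `step` stand for $\Delta$ and $\delta$ measured in samples and their values are illustrative. It only illustrates that $C_k^i$ is non-increasing in $i$ for this instance.

```python
import numpy as np

def length(path):
    """Cost under g = 1 at unit speed: total path length."""
    return float(np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1)))

def segment(p, q, dt):
    """Straight (optimal) path from p to q sampled at spacing <= dt; excludes p, ends at q."""
    n = max(1, int(np.ceil(np.linalg.norm(q - p) / dt)))
    return p + (q - p) * (np.arange(1, n + 1) / n)[:, None]

def single_update(prev, lam, lag, goal_y, dt):
    """Pursue prev with a single replanning at sample lam (as in Lemma 1)."""
    head = prev[: lam + 1]
    if lam < len(prev) - 1 - lag:
        jump = segment(prev[lam], prev[lam + lag], dt)[:-1]   # drop q, it starts the tail
        tail = np.vstack([jump, prev[lam + lag:]])
    else:
        tail = segment(prev[lam], np.array([prev[lam, 0], goal_y]), dt)
    return np.vstack([head, tail])

dt, lag, step, goal_y = 0.05, 10, 5, 2.0      # hypothetical Delta = lag*dt, delta = step*dt
origin, corner = np.zeros(2), np.array([2.0, 0.0])
prev = np.vstack([origin, segment(origin, corner, dt),
                  segment(corner, np.array([2.0, goal_y]), dt)])   # x_k^0
costs = [length(prev)]                         # C_k^0, C_k^1, ...
i = 1
while (i - 1) * step < len(prev):
    prev = single_update(prev, (i - 1) * step, lag, goal_y, dt)   # x_k^i
    costs.append(length(prev))
    i += 1
print(round(costs[0], 3), round(costs[-1], 3))
```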


To proceed to the main theorem, we will require that the optimal cost of (2) change “little” for small changes to the endpoints of a trajectory:

Condition 1 Assume that for a generic trajectory $x_1(t)$ there exists a $\delta > 0$ such that for all $a, b_1, b_2 \in D$ and all $h > 0$, there exists a trajectory $x_2(t)$ such that the cost $C(x_1, 0, T)$ ($x_1(0) = a$, $x_1(T) = b_1$) from $a$ to $b_1$ and the cost $C(x_2, 0, T)$ ($x_2(0) = a$, $x_2(T) = b_2$) from $a$ to $b_2$ satisfy

$$\|b_1 - b_2\|_\infty < h \;\Rightarrow\; |C(x_1, 0, T) - C(x_2, 0, T)| < Ch$$

for some constant $C$, independent of $h$.
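To illustrate when Condition 1 can hold, consider a hypothetical instance that is not taken from the report: $\dot{x} = u$ with $U = \mathbb{R}^n$, running cost $g(x, \dot{x}) = \|\dot{x}\|_2$ (path length), and endpoints $b_1, b_2$ at which $F$ takes the same value (for example, both outside $S_Q$). Given $x_1$ from $a$ to $b_1$, shifting it linearly in time yields an admissible $x_2$ from $a$ to $b_2$, and the reverse triangle inequality gives the bound with $C = \sqrt{n}$:

```latex
\begin{align*}
x_2(t) &= x_1(t) + \tfrac{t}{T}\,(b_2 - b_1), \qquad x_2(0) = a, \quad x_2(T) = b_2,\\
\big|C(x_1,0,T) - C(x_2,0,T)\big|
  &= \Big|\int_0^T \Big(\|\dot{x}_1\|_2 - \big\|\dot{x}_1 + \tfrac{1}{T}(b_2 - b_1)\big\|_2\Big)\,dt\Big| \\
  &\le \int_0^T \tfrac{1}{T}\,\|b_2 - b_1\|_2\,dt
   = \|b_1 - b_2\|_2 \le \sqrt{n}\,\|b_1 - b_2\|_\infty < \sqrt{n}\,h .
\end{align*}
```

If the control set $U$ were bounded, the shifted trajectory might not be admissible, so this example relies on $U = \mathbb{R}^n$.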

Then the next lemma holds:

Lemma 3 Let $x^*(t)$ be a trajectory of (1) such that: i) $x^*(t)$ ($t \in [0, t_1+\Delta_1]$) is optimal (in the sense of (3)) from $x^*(0)$ to $x^*(t_1+\Delta_1)$, and ii) $x^*(t)$ ($t \in [t_1, T^*]$) is optimal (in the sense of (4)) from $x^*(t_1)$ to the constraint set $S_Q$. Assume Condition 1 is satisfied and $0 < t_1 < t_1+\Delta_1 < T^*$. Then the trajectory $x^*(t)$ ($t \in [0, T^*]$) is a local minimum of (4) from $x^*(0)$ to $S_Q$.

PROOF. Choose $0 < \Delta < \Delta_1$. From the principle of optimality, $x^*(t)$ ($t \in [0, t_1+\Delta]$) and $x^*(t)$ ($t \in [t_1, T^*]$) are each locally optimal with respect to their corresponding end points. Suppose $\|x^*(t_1+\Delta) - s\|_\infty \ge E_1$ for any $s \in S_Q$ and that $x^*(t)$ ($t \in [0, T^*]$) is not a local minimum. There must exist $\varepsilon < \min(\delta, E_1/2)$ (where $\delta$ is defined in Condition 1) and another optimum $x(t) \in B \times [0, T]$ satisfying $\|x(t) - x^*(t)\|_\infty < \varepsilon$ and $C(x(t), 0, T) < C(x^*(t), 0, T^*)$.

Notice that $\|x(t_1+\Delta) - s\|_\infty > \varepsilon$ for any $s \in S_Q$, since $\|x^*(t_1+\Delta) - s\|_\infty \ge E_1 > 2\varepsilon$ while $\|x(t_1+\Delta) - x^*(t_1+\Delta)\|_\infty < \varepsilon$. Construct two trajectories $y_1(t)$, $y_2(t)$ ($t \in [t_1, t_1+\Delta]$) that connect $x(t)$ and $x^*(t)$ (see Fig. 4) and satisfy Condition 1 (with $x^*$ or $x$ playing the role of $x_1$, and $y_1$ or $y_2$ standing in for $x_2$). In particular, let $y_1$, $y_2$ be such that $x^*(t_1) = y_2(t_1)$, $x^*(t_1+\Delta) = y_1(t_1+\Delta)$, $x(t_1) = y_1(t_1)$, $x(t_1+\Delta) = y_2(t_1+\Delta)$. Now, Condition 1 implies that

$$\begin{aligned}
C(y_1(t), t_1, \Delta) &\le C(x(t), t_1, \Delta) + C\Delta \\
C(y_2(t), t_1, \Delta) &\le C(x^*(t), t_1, \Delta) + C\Delta \qquad (7)
\end{aligned}$$

Fig. 4. Illustrating the proof of Lemma 3: “overlapping” optimal trajectories form a locally optimal trajectory.

Because $x^*(t)$ ($t \in [0, t_1+\Delta]$) and $x^*(t)$ ($t \in [t_1, T^*]$) are each locally optimal, the following hold:

$$C(x^*(t), 0, t_1) + C(x^*(t), t_1, \Delta) \le C(x(t), 0, t_1) + C(y_1(t), t_1, \Delta), \quad \text{and} \qquad (8)$$

$$C(x^*(t), t_1, \Delta) + C(x^*(t), t_1+\Delta, T^*-t_1-\Delta) \le C(x(t), t_1+\Delta, T-t_1-\Delta) + C(y_2(t), t_1, \Delta) \qquad (9)$$

Combining (7) with (8) and (9) leads to

$$C(x^*(t), 0, T^*) \le C(x(t), 0, T) + 2C\Delta \qquad (10)$$

The cost $C(x(t), 0, T)$ is, by assumption, less than $C(x^*(t), 0, T^*)$; but if $\Delta$ is chosen so that

$$\Delta < \frac{C(x^*(t), 0, T^*) - C(x(t), 0, T)}{2C}$$

then (10) cannot hold. This is a contradiction, because $\Delta$ can be chosen arbitrarily small. It follows that $x^*(t)$ ($t \in [0, T^*]$) must be a local minimum. □
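The step from (7) to (9) to (10) can be written out explicitly. This sketch assumes, as the proof implicitly does, that costs are additive over consecutive pieces of a trajectory, i.e. that the terminal terms in (5) vanish at the intermediate points $x(t_1)$, $x(t_1+\Delta)$ (which lie outside $S_Q$):

```latex
\begin{align*}
&C(x^*,0,t_1) + 2\,C(x^*,t_1,\Delta) + C(x^*,t_1+\Delta,T^*-t_1-\Delta) \\
&\quad\le C(x,0,t_1) + C(y_1,t_1,\Delta) + C(y_2,t_1,\Delta) + C(x,t_1+\Delta,T-t_1-\Delta)
  && \text{adding (8) and (9)}\\
&\quad\le C(x,0,t_1) + C(x,t_1,\Delta) + C(x^*,t_1,\Delta) + C(x,t_1+\Delta,T-t_1-\Delta) + 2C\Delta
  && \text{by (7)}
\end{align*}
```

Subtracting $C(x^*,t_1,\Delta)$ from both sides and using additivity on each side gives $C(x^*,0,T^*) \le C(x,0,T) + 2C\Delta$, which is (10).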


Assume that the locally optimal trajectory from the follower to the leader (or to $S_Q$) is unique at all times. This assumption is generally satisfied if pursuit is restricted to take place within a “small” region (by setting $\Delta$ small), i.e. if agents follow “close” to one another. Then, convergence of the trajectories’ cost also implies convergence of the trajectories themselves:

Lemma 4 If at all times during mCLP the locally optimal trajectory from follower to leader (or to $S_Q$) is unique, then mCLP converges to a limiting trajectory $x_\infty(t)$.


PROOF. Suppose that the trajectories’ cost converges but that there exists more than one limiting trajectory. Let $x_1(t)$ ($t \in [0, T_1]$) and $x_2(t)$ ($t \in [0, T_2]$) be two such possibilities. Let $t_1 \in [0, T_1]$ be the earliest time at which $x_1(t)$ differs from $x_2(t)$. From Lemma 2, $x_1$ and $x_2$ must have the same cost; otherwise convergence of the cost is contradicted. Suppose that a leader $x_{k-1}(t)$ travels along $x_1(t)$, while a follower $x_k(t)$ travels along $x_2(t)$. Choose $h > 0$ small, and suppose that a series of sampled updates occurs at $t_1 + ih$ ($i = 0, 1, 2, \dots, n$, with $n = (T_1 - t_1 - \Delta)/h + 1$), as Fig. 5 indicates.

Fig. 5. Illustrating the proof of Lemma 4: pursuit between agents moving on two supposed “limiting” equal-cost trajectories leads to the conclusion that the cost along the follower’s trajectory is less than that along the leader’s.

Consider the update occurring at $t_1$: after this update, the follower moves on $x_2(t)$, $t \in [t_1, t_1+h)$. This means either that the trajectory passing through $x_2(t)$, $t \in [t_1, t_1+h)$, followed by the optimal trajectory from $x_2(t_1+h)$ to $x_1(t_1+\Delta)$ (indicated by the left dashed line in Fig. 5) has less cost than $x_1(t)$, $t \in [t_1, t_1+\Delta)$, or that it has the same cost as $x_1(t)$, $t \in [t_1, t_1+\Delta)$. The latter contradicts the assumption that there exists a unique locally optimal trajectory from follower to leader at any time. Therefore the locally optimal trajectory that the follower actually calculated at $t_1$ has cost $C(x_2, t_1, h) + \eta(x_2(t_1+h), x_1(t_1+\Delta), t_1+h)$, and

$$C(x_2, t_1, h) + \eta(x_2(t_1+h), x_1(t_1+\Delta), t_1+h) < C(x_1, t_1, \Delta) \qquad (11)$$

Similarly, investigating the updates occurring at $t_1 + ih$ ($i = 1, 2, \dots, n-1$), we obtain

$$C(x_2, t_1+h, h) + \eta(x_2(t_1+2h), x_1(t_1+\Delta+h), t_1+2h) < \eta(x_2(t_1+h), x_1(t_1+\Delta), t_1+h) + C(x_1, t_1+\Delta, h) \qquad (12)$$

$$\vdots$$

$$C(x_2, t_1+(n-1)h, h) + \eta(x_2(t_1+nh), x_1(T_1), t_1+nh) < \eta(x_2(t_1+(n-1)h), x_1(T_1-h), t_1+(n-1)h) + C(x_1, T_1-h, h) \qquad (13)$$

And at the last update step, the follower will choose the locally optimal trajectory from itself to $S_Q$:

$$C(x_2, t_1+nh, T_2-t_1-nh) \le \eta_Q(x_2(t_1+nh), t_1+nh) \qquad (14)$$

which, by Fact 2(3) and since $x_1(T_1) \in S_Q$, is in turn bounded by $\eta(x_2(t_1+nh), x_1(T_1), t_1+nh)$.

Notice that these inequalities do not depend on the size of $h$. No matter how small $h$ is, it can always be concluded from (11) to (14) that

$$C(x_2, t_1, T_2-t_1) < C(x_1, t_1, T_1-t_1) \qquad (15)$$

If we let $h \to 0$, the above sampled process approaches continuous local pursuit. However, (15) does not change as $h$ decreases; therefore the cost along $x_2(t)$ must be less than that along $x_1(t)$ under mCLP, which contradicts the convergence of the cost under mCLP. □
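The passage from (11) to (14) to (15) is a telescoping sum, sketched here under the same additivity assumption on $C$ used elsewhere (terminal terms in (5) vanish at intermediate points outside $S_Q$). Write $E_j = \eta(x_2(t_1+jh), x_1(t_1+\Delta+(j-1)h), t_1+jh)$; reading off (13), the last sampled leader position is $x_1(T_1)$ with $T_1 = t_1+\Delta+(n-1)h$. Then (11) reads $C(x_2,t_1,h) + E_1 < C(x_1,t_1,\Delta)$, and (12) and (13) are the cases $j = 1, \dots, n-1$ of the first line below:

```latex
\begin{align*}
C(x_2, t_1+jh, h) + E_{j+1} &< E_j + C(x_1, t_1+\Delta+(j-1)h, h), \qquad j = 1,\dots,n-1.\\
\intertext{Adding these to (11), the terms $E_1,\dots,E_{n-1}$ cancel:}
C(x_2, t_1, nh) + E_n &< C(x_1, t_1, \Delta+(n-1)h) = C(x_1, t_1, T_1-t_1).\\
\intertext{Finally, (14) and Fact 2(3), with $x_1(T_1) \in S_Q$, give $C(x_2, t_1+nh, T_2-t_1-nh) \le E_n$, hence}
C(x_2, t_1, T_2-t_1) &\le C(x_2, t_1, nh) + E_n < C(x_1, t_1, T_1-t_1),
\end{align*}
```

which is (15).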

Lemma 5 Along the limiting trajectory produced under mCLP, the planned trajectories $\hat{x}_k(t)$ and realized trajectories $x_k(t)$ overlap, i.e. $\hat{x}_k(t) = x_k(t)$. Furthermore, if the locally optimal trajectories obtained at every updating time are smooth, then the limiting trajectory is also smooth.


PROOF. Suppose that a leader $x_{k-1}$ evolves along the limiting trajectory $x_\infty(t)$. Lemma 4 then implies that $x_{k-1}(t) = x_k(t+\Delta)$ for all $t \in [t_{k-1}, t_{k-1}+T_{k-1}]$.

Fig. 6. Differences between the planned and realized trajectories contradict the convergence of trajectories under mCLP.

Suppose that, with the leader at $x_{k-1}(t_1+T(t_1))$, where $T(t_1)$ is the best final time for the update at $t_1$, and the follower at $x_k(t_1)$, the planned trajectory $\hat{x}_{t_1}(t)$ ($t \in [t_1, t_1+T(t_1))$) obtained at $t_1$ differs from $x_k(t)$ ($t \in [t_1, t_1+T(t_1))$) starting at some time $t_2 > t_1$. Furthermore, let $\hat{x}_{t_1}(t_1+T(t_1)) = x_k(t_1+T(t_1))$. Because the planned trajectory $\hat{x}_{t_1}(t)$ is unique (by assumption) and optimal,

$$C(\hat{x}_{t_1}, t_2, T(t_1)-(t_2-t_1)) < C(x_k, t_2, T(t_1)-(t_2-t_1)).$$

Construct the trajectory

$$\tilde{x}(t) = \begin{cases} \hat{x}_{t_1}(t), & t \in [t_2, t_1+T(t_1)) \\ x_k(t), & t \in [t_1+T(t_1), t_2+T(t_2)] \end{cases}$$

Clearly, $\tilde{x}$ has lower cost than $x_k(t)$ ($t \in [t_2, t_2+T(t_2)]$) (see Fig. 6). Thus, under mCLP, the follower would have taken $\tilde{x}$ (or another trajectory with even lower cost) over $x_k(t)$ ($t \in [t_2, t_2+T(t_2)]$). This contradicts the convergence to a limiting trajectory. The same argument can be applied at any other updating time, so it can be concluded that $\hat{x}_k(t) = x_k(t)$ ($t \in [0, T_\infty]$).

Recall that $x_k(t)$ is smooth for $t \in [t_1, t_1+T(t_1)]$, because the locally optimal trajectories linking follower and leader are smooth by assumption. Similarly, $x_k(t)$ is smooth for $t \in [t_2, t_2+T(t_2)]$ for any $t_1 < t_2 < t_1+T(t_1)$. Therefore, $x_k(t)$ ($t \in [t_1, t_2+T(t_2)]$) is smooth. Repeated applications of this argument lead to the conclusion that the entire trajectory $x_k(t)$ ($t \in [0, T_\infty]$) is smooth. □


The next theorem is an immediate consequence of Lemmas 1 to 5:

Theorem 1 Suppose that the group of (1) evolves under mCLP and that at all times $t$, the locally optimal trajectories from follower to leader are unique. Then the limiting trajectory is unique and locally optimal. It is also smooth if the locally optimal trajectories calculated at every updating time are smooth.


PROOF. From Lemma 4, the limiting trajectory is unique. It follows that $x_k(t) = x_{k-1}(t-\Delta)$ if $x_{k-1}(t) = x_\infty(t - t_{k-1})$. Choose $\delta_1, \delta_2$ such that $0 < \delta_1 < \delta_2 < T$ for all optimal final times $T$ of the planned trajectories $\hat{x}_k$ generated during mCLP. The limiting trajectory $x_\infty$ is piecewise smooth and locally optimal for $t \in [t_k + i\delta_1, t_k + i\delta_1 + \delta_2]$, $i = 0, 1, 2, \dots$, because it coincides with the planned trajectories $\hat{x}_k(t)$. From Lemma 3 (in this case $S_Q$ is a single point), it can be concluded that $x_k(t)$ ($t \in [t_k, t_k+\delta_1+\delta_2]$) is optimal, because it is the composition of two overlapping locally optimal trajectories, $x_k(t)$ ($t \in [t_k, t_k+\delta_2]$) and $x_k(t)$ ($t \in [t_k+\delta_1, t_k+\delta_1+\delta_2]$). From successive applications of this argument ($i = 2, 3, \dots$), we conclude that $x_\infty(t)$ is locally optimal. Smoothness of $x_\infty$ is proved via a similar “piece by piece” argument. □


3.1 Remarks

Local pursuit is a cooperative, decentralized algorithm for learning optimal controls/trajectories, starting from a feasible solution. Each agent is only required to calculate optimal trajectories from its own state to that of its nearby leader. Because agents are separated by $\Delta$ time units as they leave $x_0$, each agent relies only on local information in order to follow its predecessor, and requires no knowledge of the global geometry. Therefore there is no need for agents to exchange or “fuse” local maps that they obtain individually. Agents do not need to communicate their choice of coordinate systems as they evolve, nor do they need to know the coordinates of $x_f$. While it is possible that a group of agents could disperse and construct a global map from local information, such an approach might require significantly more computation and communication than local pursuit. The latter solves the optimal control problem in many “short pieces”, which eliminates the need to compute the optimum over the whole environment. Thus local pursuit is appropriate for systems with short-range sensors (for example, a swarm of robots exploring unknown terrain) and for optimal control problems that are easier to solve over “short” distances.
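To make the mechanism concrete, here is a minimal sketch of a sampled version of local pursuit in the simplest setting we can think of: a flat plane with $\dot{x}_k = u_k$, $\|u_k\| = 1$. There the locally optimal trajectory from a follower to its leader's current position is a straight segment, so each agent just steps a distance `dt` toward its leader and re-plans. We assume this rule approximates mCLP in this setting. The helper names (`point_at`, `pursue`, `local_pursuit`), the polyline initial path, the time step and $\Delta = 2$ are all hypothetical and not taken from the report's simulations.

```python
import math


def point_at(vertices, s):
    """Point at arc length s along a polyline (clamped to the last vertex)."""
    for a, b in zip(vertices, vertices[1:]):
        seg = math.dist(a, b)
        if seg > 0 and s <= seg:
            r = s / seg
            return (a[0] + r * (b[0] - a[0]), a[1] + r * (b[1] - a[1]))
        s -= seg
    return vertices[-1]


def initial_trajectory(vertices, dt):
    """First agent: walks the feasible polyline at unit speed, sampled every dt."""
    total = sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))
    n = math.ceil(total / dt)
    return [point_at(vertices, min(j * dt, total)) for j in range(n + 1)]


def pursue(leader, lag, dt, goal, max_steps=1_000_000):
    """Hypothetical follower leaving `lag` samples after its leader. At each
    sample it moves a distance dt along the straight segment toward the
    leader's current position (the minimum-time path in the plane)."""
    traj = [leader[0]]
    for j in range(max_steps):
        # once the leader's samples run out, it waits at the goal
        tx, ty = leader[min(j + lag, len(leader) - 1)]
        x, y = traj[-1]
        d = math.hypot(tx - x, ty - y)
        if d <= dt:
            traj.append((tx, ty))  # within one step: land on the leader
        else:
            traj.append((x + dt * (tx - x) / d, y + dt * (ty - y) / d))
        if math.dist(traj[-1], goal) < 1e-9:
            break
    return traj


def path_length(traj):
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def local_pursuit(vertices, delta, dt=0.01, n_agents=8):
    """Agent 0 follows the polyline; agent k pursues agent k-1."""
    lag = round(delta / dt)  # separation Delta expressed in samples
    trajs = [initial_trajectory(vertices, dt)]
    for _ in range(n_agents - 1):
        trajs.append(pursue(trajs[-1], lag, dt, vertices[-1]))
    return trajs


# hypothetical feasible path from x0 = (0, 0) to xf = (10, 0), with Delta = 2
lengths = [path_length(t) for t in local_pursuit([(0, 0), (5, 5), (10, 0)], delta=2.0)]
print([round(length, 3) for length in lengths])
```

The first length is about $2\sqrt{50} \approx 14.14$, and the lengths then shrink from agent to agent while staying above the straight-line distance 10. With equal unit speeds, the follower's distance to its leader never grows beyond $\Delta$, so no agent's trip is longer than its predecessor's (up to one time step).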

The local pursuit algorithms assume a countable infinity of agents; of course, such a collection cannot be realized. It is, however, possible to achieve the same results with a finite number of agents that apply local pursuit to reach the final constraint set $S_f$ from $x_0$ and then return to $x_0$ along the obtained path. The required modifications are straightforward but are beyond the scope of this report. An experiment that uses this technique is detailed in [8]. Finally, local pursuit is not guaranteed to converge to the global optimum. The choice of agent separation $\Delta$ can affect whether the limiting trajectory is a local or a global optimum. Some interesting cases involving spaces with holes or obstacles are discussed in [8,14].

4  Simulations  and  Experiments 

In this section, we describe a series of simulations and an experiment designed to illustrate the performance of local pursuit.

4.1 A trail optimization problem with free final states

Consider the problem of finding shortest paths in an environment consisting of a plane with two right cones, whose top view is shown in Fig. 7. The radius and height of each cone were 800 and 1000 units of length, respectively. Each object (the plane and each cone) was parametrized with its own set of coordinate functions. The agents were governed by $\dot{x}_k = u_k$, $\|u_k\| = 1$, and were required to travel from $x_0 = (3500, 0, 0)$ to the second cone.

Fig. 7 shows the iterated trajectories generated by a collection of systems implementing the mCLP policy with $T_0 = 3499$ and $\Delta = 0.2T_0$. To compute its optimal trajectory, each agent had to solve its own optimal control problem, which was simpler than the “global” problem, partly because the optimal trajectory crosses multiple coordinate patches as it passes from the plane to the cones and vice versa. When leader and follower were both on the plane or on the same cone, the computation of optimal trajectories was straightforward. In other cases, agents had to compute optimal trajectories that crossed between at most two coordinate patches (plane-to-cone or cone-to-plane). On the other hand, computing the optimal trajectory at once would require searching over a four-parameter family of curves (there are a total of four “crossings” between coordinate sets). A thorough accounting of the computational requirements and numerical performance of local pursuit will be forthcoming.
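The per-agent subproblems described above can be sketched as follows. On a cone of base radius $r$ and slant height $s = \sqrt{r^2 + h^2}$, the shortest path between two points becomes a straight segment once the lateral surface is unrolled into a planar sector of angle $2\pi r / s$. A path with a single plane-to-cone crossing then reduces to a one-parameter search over the azimuth of the rim crossing. This sketch assumes each cone stands on the plane with a vertical axis and that cone points are given as (slant distance from the apex, azimuth). The coarse grid with golden-section refinement is our own choice. The names `cone_geodesic`, `plane_to_cone` and `center` are hypothetical.

```python
import math

R_BASE, HEIGHT = 800.0, 1000.0        # cone dimensions quoted in the text
SLANT = math.hypot(R_BASE, HEIGHT)    # apex-to-rim distance s, about 1280.6


def cone_geodesic(p, q, r=R_BASE, s=SLANT):
    """Shortest distance on the lateral surface of a right circular cone.
    p, q = (rho, theta): slant distance from the apex and azimuth in radians."""
    (r1, t1), (r2, t2) = p, q
    dtheta = abs(t1 - t2) % (2 * math.pi)
    dtheta = min(dtheta, 2 * math.pi - dtheta)
    # Unrolling maps azimuth theta to angle theta * r / s in a planar sector.
    # Since r < s, phi < pi, so the straight segment stays on the surface.
    phi = dtheta * r / s
    return math.sqrt(max(r1**2 + r2**2 - 2 * r1 * r2 * math.cos(phi), 0.0))


def plane_to_cone(p_plane, q_cone, center, r=R_BASE, s=SLANT, n_grid=720):
    """Shortest path from a point on the ground plane to a point on a cone
    standing at `center`, crossing the rim exactly once at azimuth phi.
    Returns (length, phi). Coarse grid + golden-section refinement."""
    def cost(phi):
        rim = (center[0] + r * math.cos(phi), center[1] + r * math.sin(phi))
        return math.dist(p_plane, rim) + cone_geodesic((s, phi), q_cone, r, s)

    step = 2 * math.pi / n_grid
    best = min((k * step for k in range(n_grid)), key=cost)
    lo, hi = best - step, best + step
    g = (math.sqrt(5) - 1) / 2          # inverse golden ratio
    for _ in range(60):
        m1, m2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if cost(m1) < cost(m2):
            hi = m2                     # minimum lies in [lo, m2]
        else:
            lo = m1                     # minimum lies in [m1, hi]
    phi = 0.5 * (lo + hi)
    return cost(phi), phi


# hypothetical layout: cone centred at the origin, follower on the plane at
# (3500, 0), leader at the apex (rho = 0)
print(plane_to_cone((3500.0, 0.0), (0.0, 0.0), center=(0.0, 0.0)))
```

For that layout the best crossing is the rim point nearest the follower, so the length is $3500 - 800 + s$. A cone-to-plane leg is the same search with the endpoints swapped.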
Fig. 7. Continuous local pursuit in a complex environment. The initial trajectory (along the borders of the cones) is easily described but far from optimal. The locally optimal trajectories were easier to compute than the global optimum because of the limited pursuit distance ($\Delta = 0.2T_0$). The iterated trajectories converged to the optimum.


4.2 Minimum-time control with limited acceleration and speed

Next, consider the minimum-time control of the second-order system

$$\ddot{x} = u, \quad \text{s.t. } |u| \le 30, \ |\dot{x}| \le 8.$$


We want to minimize $J = T$, with the boundary conditions $\dot{x}(0) = \dot{x}(T) = 0$, $x(0) = 0$ and $x(T)$ fixed (in this simulation, $x(T)$ is determined by the input to the first agent). Here the constraint set $S_f$ is a single point in the state space. The optimal control policy is similar to the well-known “bang-bang” control: the control $u$ switches at most once between $30$ and $-30$, and $u = 0$ when the maximum or minimum speed $\dot{x}$ has been reached. The initial, suboptimal input (Agent 1 in Fig. 8) alternated between the maximum and minimum available acceleration. When using mCLP with $\Delta = 1.3$ s, the third agent's trajectory was optimal; see Fig. 8 for an illustration. Notice that for $t > 2.1$ s the second agent had intercepted the first and subsequently moved along the same trajectory $x_1$. It is also interesting to note that in this case, optimality was achieved after a finite number of iterations.

Fig. 8. Iterative trajectories (panels: Agent 1, Agent 2, Agent 3) for minimum-time control with limited acceleration and speed. The simulated control loop ran at a frequency of 2000 Hz so that the control policy could be regarded as approximately mCLP. The pursuit interval was $\Delta = 1.3$ s. Units for acceleration, velocity and position are rad/s², rad/s and rad, respectively.
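For a rest-to-rest move of fixed length $D = x(T)$, the policy just described can be written in closed form. The agent accelerates at the bound, coasts at the speed limit if it is reached, and then decelerates. If $D \ge v_{\max}^2 / a_{\max}$, the velocity profile is a trapezoid with $T = D / v_{\max} + v_{\max} / a_{\max}$. Otherwise it is a triangle with $T = 2\sqrt{D / a_{\max}}$. The sketch below uses the bounds 30 and 8 from the problem statement. The target $D$, the helper names and the fixed-step integration check are assumptions for illustration, since in the simulation $x(T)$ was set by the first agent's input.

```python
import math


def min_time_profile(D, a_max=30.0, v_max=8.0):
    """Rest-to-rest minimum-time move of length D > 0 for x'' = u with
    |u| <= a_max and |x'| <= v_max. Returns (T, t_acc, t_cruise)."""
    d_ramp = v_max**2 / a_max           # distance of a full speed-up plus slow-down
    if D >= d_ramp:                     # trapezoid: coast at v_max in between
        t_acc = v_max / a_max
        t_cruise = (D - d_ramp) / v_max
    else:                               # triangle: v_max is never reached
        t_acc = math.sqrt(D / a_max)
        t_cruise = 0.0
    return 2 * t_acc + t_cruise, t_acc, t_cruise


def control(t, t_acc, t_cruise, a_max=30.0):
    """u(t) = +a_max, then 0 while coasting, then -a_max: one sign switch."""
    if t < t_acc:
        return a_max
    if t < t_acc + t_cruise:
        return 0.0
    if t < 2 * t_acc + t_cruise:
        return -a_max
    return 0.0


def simulate(D, dt=1e-4, a_max=30.0, v_max=8.0):
    """Integrate x'' = u under control() with a fixed step; return (T, x, v)."""
    T, t_acc, t_cruise = min_time_profile(D, a_max, v_max)
    x = v = 0.0
    for k in range(round(T / dt)):
        u = control(k * dt, t_acc, t_cruise, a_max)
        x += v * dt + 0.5 * u * dt**2   # exact for u constant over the step
        v += u * dt
    return T, x, v


print(simulate(10.0))  # hypothetical target position x(T) = 10 rad
```

The two cases meet at $D = v_{\max}^2 / a_{\max} = 64/30$, where both give $T = 16/30$ s. The simulated final state lands within a time step's worth of error of $(x, \dot{x}) = (D, 0)$.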


4.3 Experiment on minimum-time control with acceleration and speed constraints


Fig. 9. Applying local pursuit with a trio of motors (Motor 1, Motor 2, Motor 3) to obtain minimum-time control with limited acceleration and speed.

We implemented the example of Sec. 4.2 using a collection of three motors, shown in Fig. 9. Each motor was equipped with position and speed sensors, which were sampled by a PC-based controller at a rate of 2000 Hz. The goal was to rotate the motors to a fixed final position in minimum time. Motor acceleration and speed were limited to 30 rad/s² and 8 rad/s, respectively.

The input to the first motor was a rectangular pulse with amplitude equal to the maximum acceleration (the same as in the simulation of Sec. 4.2). Each of the remaining two motors tried to “catch up” with its predecessor by reaching the predecessor's state in minimum time. The trajectories of all three motors with $\Delta = 1.3$ s are shown in Fig. 10. We see that the third motor evolved under essentially optimal control, and that the second motor “intercepted” the first after $t \approx 2.3$ s.

Because of unmodeled friction, the final position $\theta(T)$ was less than the nominal value (see $x(T)$ in the previous simulation). Friction also caused the motors to decelerate when a zero input was applied (once the motors reached maximum speed). In turn, that deceleration caused the mCLP policy to try to catch up by introducing a positive control input, resulting in the chatter observed in the velocity and acceleration curves of motors 2 and 3 in Fig. 10.

Fig. 10. Iterative trajectories of motors when applying local pursuit to attain minimum-time control with limited acceleration and speed. The pursuit interval was $\Delta = 1.3$ s. The third motor evolved under essentially optimal control.

5 Conclusions and ongoing work

This report explored a biologically-inspired cooperative strategy (termed “Local Pursuit”) for solving a class of optimal control problems with free final time and partially-constrained final state. The proposed algorithms generalize previous models that mimic the foraging behavior of ant colonies and allow a collective to discover optimal controls, starting from an initial suboptimal solution. Members of the collective are only required to obtain local information on their environment and to calculate optimal trajectories to their nearby neighbors. The local pursuit algorithm relies on cooperation to perform a task which would be difficult or impossible for a single system to perform, namely solving an optimal control problem with limited information (in terms of coordinate systems that describe the environment or the coordinates of the final state) and short-range sensing.

Although  this  work  was  inspired  by  a  desire  to  explore 
the  limits  of  a  simple-to-formulate,  bio-inspired  control 
policy,  mCLP  and  especially  its  “sampled”  counterpart 
could  be  interesting  as  numerical  methods  for  computing 
optimal  controls.  Work  in  that  direction  is  ongoing. 